{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regressão Linear Multivariada - Trabalho\n",
    "\n",
    "## Estudo de caso: Qualidade de Vinhos\n",
    "\n",
    "Nesta trabalho, treinaremos um modelo de regressão linear usando descendência de gradiente estocástico no conjunto de dados da Qualidade do Vinho. O exemplo pressupõe que uma cópia CSV do conjunto de dados está no diretório de trabalho atual com o nome do arquivo *winequality-white.csv*.\n",
    "\n",
    "O conjunto de dados de qualidade do vinho envolve a previsão da qualidade dos vinhos brancos em uma escala, com medidas químicas de cada vinho. É um problema de classificação multiclasse, mas também pode ser enquadrado como um problema de regressão. O número de observações para cada classe não é equilibrado. Existem 4.898 observações com 11 variáveis de entrada e 1 variável de saída. Os nomes das variáveis são os seguintes:\n",
    "\n",
    "1. Fixed acidity.\n",
    "2. Volatile acidity.\n",
    "3. Citric acid.\n",
    "4. Residual sugar.\n",
    "5. Chlorides.\n",
    "6. Free sulfur dioxide. \n",
    "7. Total sulfur dioxide. \n",
    "8. Density.\n",
    "9. pH.\n",
    "10. Sulphates.\n",
    "11. Alcohol.\n",
    "12. Quality (score between 0 and 10).\n",
    "\n",
    "O desempenho de referencia de predição do valor médio é um RMSE de aproximadamente 0.148 pontos de qualidade.\n",
    "\n",
    "Utilize o exemplo apresentado no tutorial e altere-o de forma a carregar os dados e analisar a acurácia de sua solução. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn import preprocessing\n",
    "from math import sqrt\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict(row, coefficients):\n",
    "    yhat = coefficients[0]\n",
    "    for i in range(len(row)-1):\n",
    "        yhat += coefficients[i + 1] * row[i]\n",
    "    return yhat\n",
    "\n",
    "def coefficients_sgd(train, l_rate, n_epoch):\n",
    "    coef = [0.0 for i in range(len(train[0]))]\n",
    "    print ('Coeficiente Inicial={0}' % (coef))\n",
    "    for epoch in range(n_epoch):\n",
    "        sum_error = 0\n",
    "        for row in train:\n",
    "            yhat = predict(row, coef)\n",
    "            error = yhat - row[-1]\n",
    "            sum_error += error**2\n",
    "            coef[0] = coef[0] - l_rate * error\n",
    "            for i in range(len(row)-1):\n",
    "                coef[i + 1] = coef[i + 1] - l_rate * error * row[i] \n",
    "        print(('epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error)))\n",
    "    return coef\n",
    "\n",
    "def rmse_metric(actual, predicted):\n",
    "    sum_error = 0.0\n",
    "    for i in range(len(actual)):\n",
    "        prediction_error = predicted[i] - actual[i]\n",
    "        sum_error += (prediction_error ** 2)\n",
    "        mean_error = sum_error / float(len(actual))\n",
    "    return sqrt(mean_error)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "def splitDataset(dataset, splitRatio):\n",
    "    trainSize = int(len(dataset) * splitRatio)\n",
    "    trainSet = []\n",
    "    copy = list(dataset)\n",
    "    while len(trainSet) < trainSize:\n",
    "        index = random.randrange(len(copy))\n",
    "        trainSet.append(copy.pop(index))\n",
    "    return [trainSet, copy]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fixed acidity</th>\n",
       "      <th>volatile acidity</th>\n",
       "      <th>citric acid</th>\n",
       "      <th>residual sugar</th>\n",
       "      <th>chlorides</th>\n",
       "      <th>free sulfur dioxide</th>\n",
       "      <th>total sulfur dioxide</th>\n",
       "      <th>density</th>\n",
       "      <th>pH</th>\n",
       "      <th>sulphates</th>\n",
       "      <th>alcohol</th>\n",
       "      <th>quality</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7.0</td>\n",
       "      <td>0.27</td>\n",
       "      <td>0.36</td>\n",
       "      <td>20.7</td>\n",
       "      <td>0.045</td>\n",
       "      <td>45.0</td>\n",
       "      <td>170.0</td>\n",
       "      <td>1.0010</td>\n",
       "      <td>3.00</td>\n",
       "      <td>0.45</td>\n",
       "      <td>8.8</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6.3</td>\n",
       "      <td>0.30</td>\n",
       "      <td>0.34</td>\n",
       "      <td>1.6</td>\n",
       "      <td>0.049</td>\n",
       "      <td>14.0</td>\n",
       "      <td>132.0</td>\n",
       "      <td>0.9940</td>\n",
       "      <td>3.30</td>\n",
       "      <td>0.49</td>\n",
       "      <td>9.5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8.1</td>\n",
       "      <td>0.28</td>\n",
       "      <td>0.40</td>\n",
       "      <td>6.9</td>\n",
       "      <td>0.050</td>\n",
       "      <td>30.0</td>\n",
       "      <td>97.0</td>\n",
       "      <td>0.9951</td>\n",
       "      <td>3.26</td>\n",
       "      <td>0.44</td>\n",
       "      <td>10.1</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7.2</td>\n",
       "      <td>0.23</td>\n",
       "      <td>0.32</td>\n",
       "      <td>8.5</td>\n",
       "      <td>0.058</td>\n",
       "      <td>47.0</td>\n",
       "      <td>186.0</td>\n",
       "      <td>0.9956</td>\n",
       "      <td>3.19</td>\n",
       "      <td>0.40</td>\n",
       "      <td>9.9</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7.2</td>\n",
       "      <td>0.23</td>\n",
       "      <td>0.32</td>\n",
       "      <td>8.5</td>\n",
       "      <td>0.058</td>\n",
       "      <td>47.0</td>\n",
       "      <td>186.0</td>\n",
       "      <td>0.9956</td>\n",
       "      <td>3.19</td>\n",
       "      <td>0.40</td>\n",
       "      <td>9.9</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \\\n",
       "0            7.0              0.27         0.36            20.7      0.045   \n",
       "1            6.3              0.30         0.34             1.6      0.049   \n",
       "2            8.1              0.28         0.40             6.9      0.050   \n",
       "3            7.2              0.23         0.32             8.5      0.058   \n",
       "4            7.2              0.23         0.32             8.5      0.058   \n",
       "\n",
       "   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \\\n",
       "0                 45.0                 170.0   1.0010  3.00       0.45   \n",
       "1                 14.0                 132.0   0.9940  3.30       0.49   \n",
       "2                 30.0                  97.0   0.9951  3.26       0.44   \n",
       "3                 47.0                 186.0   0.9956  3.19       0.40   \n",
       "4                 47.0                 186.0   0.9956  3.19       0.40   \n",
       "\n",
       "   alcohol  quality  \n",
       "0      8.8        6  \n",
       "1      9.5        6  \n",
       "2     10.1        6  \n",
       "3      9.9        6  \n",
       "4      9.9        6  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data =  pd.read_csv('winequality-white.csv',delimiter=';')\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "i=data.get('quality')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>10</th>\n",
       "      <th>11</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.307692</td>\n",
       "      <td>0.186275</td>\n",
       "      <td>0.216867</td>\n",
       "      <td>0.308282</td>\n",
       "      <td>0.106825</td>\n",
       "      <td>0.149826</td>\n",
       "      <td>0.373550</td>\n",
       "      <td>0.267785</td>\n",
       "      <td>0.254545</td>\n",
       "      <td>0.267442</td>\n",
       "      <td>0.129032</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.240385</td>\n",
       "      <td>0.215686</td>\n",
       "      <td>0.204819</td>\n",
       "      <td>0.015337</td>\n",
       "      <td>0.118694</td>\n",
       "      <td>0.041812</td>\n",
       "      <td>0.285383</td>\n",
       "      <td>0.132832</td>\n",
       "      <td>0.527273</td>\n",
       "      <td>0.313953</td>\n",
       "      <td>0.241935</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.413462</td>\n",
       "      <td>0.196078</td>\n",
       "      <td>0.240964</td>\n",
       "      <td>0.096626</td>\n",
       "      <td>0.121662</td>\n",
       "      <td>0.097561</td>\n",
       "      <td>0.204176</td>\n",
       "      <td>0.154039</td>\n",
       "      <td>0.490909</td>\n",
       "      <td>0.255814</td>\n",
       "      <td>0.338710</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.326923</td>\n",
       "      <td>0.147059</td>\n",
       "      <td>0.192771</td>\n",
       "      <td>0.121166</td>\n",
       "      <td>0.145401</td>\n",
       "      <td>0.156794</td>\n",
       "      <td>0.410673</td>\n",
       "      <td>0.163678</td>\n",
       "      <td>0.427273</td>\n",
       "      <td>0.209302</td>\n",
       "      <td>0.306452</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.326923</td>\n",
       "      <td>0.147059</td>\n",
       "      <td>0.192771</td>\n",
       "      <td>0.121166</td>\n",
       "      <td>0.145401</td>\n",
       "      <td>0.156794</td>\n",
       "      <td>0.410673</td>\n",
       "      <td>0.163678</td>\n",
       "      <td>0.427273</td>\n",
       "      <td>0.209302</td>\n",
       "      <td>0.306452</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         0         1         2         3         4         5         6   \\\n",
       "0  0.307692  0.186275  0.216867  0.308282  0.106825  0.149826  0.373550   \n",
       "1  0.240385  0.215686  0.204819  0.015337  0.118694  0.041812  0.285383   \n",
       "2  0.413462  0.196078  0.240964  0.096626  0.121662  0.097561  0.204176   \n",
       "3  0.326923  0.147059  0.192771  0.121166  0.145401  0.156794  0.410673   \n",
       "4  0.326923  0.147059  0.192771  0.121166  0.145401  0.156794  0.410673   \n",
       "\n",
       "         7         8         9         10   11  \n",
       "0  0.267785  0.254545  0.267442  0.129032  0.5  \n",
       "1  0.132832  0.527273  0.313953  0.241935  0.5  \n",
       "2  0.154039  0.490909  0.255814  0.338710  0.5  \n",
       "3  0.163678  0.427273  0.209302  0.306452  0.5  \n",
       "4  0.163678  0.427273  0.209302  0.306452  0.5  "
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = data.values \n",
    "min_max_scaler = preprocessing.MinMaxScaler()\n",
    "x_scaled = min_max_scaler.fit_transform(data)\n",
    "data = pd.DataFrame(x_scaled)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = np.column_stack([data, i])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, test = train_test_split(ds, test_size = 0.3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "l = 0.001\n",
    "ep = 50"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Coeficiente Inicial={0}\n",
      "epoch=0, lrate=0.001, error=10184.773\n",
      "epoch=1, lrate=0.001, error=1238.888\n",
      "epoch=2, lrate=0.001, error=1087.924\n",
      "epoch=3, lrate=0.001, error=959.789\n",
      "epoch=4, lrate=0.001, error=850.077\n",
      "epoch=5, lrate=0.001, error=755.439\n",
      "epoch=6, lrate=0.001, error=673.278\n",
      "epoch=7, lrate=0.001, error=601.555\n",
      "epoch=8, lrate=0.001, error=538.651\n",
      "epoch=9, lrate=0.001, error=483.262\n",
      "epoch=10, lrate=0.001, error=434.326\n",
      "epoch=11, lrate=0.001, error=390.967\n",
      "epoch=12, lrate=0.001, error=352.458\n",
      "epoch=13, lrate=0.001, error=318.183\n",
      "epoch=14, lrate=0.001, error=287.623\n",
      "epoch=15, lrate=0.001, error=260.332\n",
      "epoch=16, lrate=0.001, error=235.927\n",
      "epoch=17, lrate=0.001, error=214.074\n",
      "epoch=18, lrate=0.001, error=194.484\n",
      "epoch=19, lrate=0.001, error=176.904\n",
      "epoch=20, lrate=0.001, error=161.112\n",
      "epoch=21, lrate=0.001, error=146.912\n",
      "epoch=22, lrate=0.001, error=134.131\n",
      "epoch=23, lrate=0.001, error=122.617\n",
      "epoch=24, lrate=0.001, error=112.234\n",
      "epoch=25, lrate=0.001, error=102.864\n",
      "epoch=26, lrate=0.001, error=94.398\n",
      "epoch=27, lrate=0.001, error=86.742\n",
      "epoch=28, lrate=0.001, error=79.813\n",
      "epoch=29, lrate=0.001, error=73.535\n",
      "epoch=30, lrate=0.001, error=67.841\n",
      "epoch=31, lrate=0.001, error=62.671\n",
      "epoch=32, lrate=0.001, error=57.973\n",
      "epoch=33, lrate=0.001, error=53.698\n",
      "epoch=34, lrate=0.001, error=49.804\n",
      "epoch=35, lrate=0.001, error=46.254\n",
      "epoch=36, lrate=0.001, error=43.013\n",
      "epoch=37, lrate=0.001, error=40.051\n",
      "epoch=38, lrate=0.001, error=37.341\n",
      "epoch=39, lrate=0.001, error=34.858\n",
      "epoch=40, lrate=0.001, error=32.581\n",
      "epoch=41, lrate=0.001, error=30.490\n",
      "epoch=42, lrate=0.001, error=28.567\n",
      "epoch=43, lrate=0.001, error=26.797\n",
      "epoch=44, lrate=0.001, error=25.165\n",
      "epoch=45, lrate=0.001, error=23.659\n",
      "epoch=46, lrate=0.001, error=22.268\n",
      "epoch=47, lrate=0.001, error=20.980\n",
      "epoch=48, lrate=0.001, error=19.788\n",
      "epoch=49, lrate=0.001, error=18.682\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[2.7171123779655759,\n",
       " 0.32237351462668024,\n",
       " -0.23875554941991181,\n",
       " 0.15948690744073757,\n",
       " 0.29469107537075606,\n",
       " 0.13968512588519358,\n",
       " 0.25888766368686994,\n",
       " 0.14422912859481937,\n",
       " 0.22814769557901651,\n",
       " 0.17435297726779794,\n",
       " 0.032503135309470799,\n",
       " 0.39766219527356844,\n",
       " 5.606197285332839]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "coefficients_sgd(train, l, ep)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
